ncRNA orthologies in the vertebrate lineage

Authors

  • Miguel Pignatelli
  • Albert J. Vilella
  • Matthieu Muffato
  • Leo Gordon
  • Simon White
  • Paul Flicek
  • Javier Herrero
Abstract

Annotation of orthologous and paralogous genes is necessary for many aspects of evolutionary analysis. Methods to infer these homology relationships have traditionally focused on protein-coding genes, and the evolutionary models used by these methods normally assume that the positions in the protein evolve independently. However, as our appreciation for the roles of non-coding RNA genes has increased, consistently annotated sets of orthologous and paralogous ncRNA genes are increasingly needed. At the same time, methods such as PHASE or RAxML have implemented substitution models that consider pairs of sites to enable proper modelling of the loops and other features of RNA secondary structure. Here, we present a comprehensive analysis pipeline for the automatic detection of orthologues and paralogues for ncRNA genes. We focus on gene families that are represented in Rfam and for which a specific covariance model is provided. For each family, the ncRNA genes found in all Ensembl species are aligned using Infernal, and several trees are built using different substitution models. In parallel, a genomic alignment that includes the ncRNA genes and their flanking sequence regions is built with PRANK. This alignment is used to create two additional phylogenetic trees using the neighbour-joining (NJ) and maximum-likelihood (ML) methods. The trees arising from both the ncRNA and genomic alignments are merged using TreeBeST, which reconciles them with the species tree in order to identify speciation and duplication events. The final tree is used to infer the orthologues and paralogues following Fitch's definition. We also determine gene gain and loss events for each family using CAFE. All data are accessible through the Ensembl Comparative Genomics ('Compara') API and on our FTP site, and are fully integrated in the Ensembl genome browser, where they can be accessed in a user-friendly manner. Database URL: http://www.ensembl.org.
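The last inference step in the abstract, reading orthologues and paralogues off a reconciled tree following Fitch's definition, can be illustrated with a small sketch. Under that definition, two genes are orthologues if their most recent common ancestor in the gene tree is a speciation event, and paralogues if it is a duplication event. The code below is not the Ensembl Compara implementation: the `Node` class, the `classify_pairs` function, and the gene names are hypothetical, and the tree is assumed to have been labelled with events already (the job TreeBeST does in the pipeline).

```python
from itertools import combinations


class Node:
    """Hypothetical gene-tree node; internal nodes carry an event label."""

    def __init__(self, event=None, gene=None, children=()):
        self.event = event          # "speciation" or "duplication" (internal nodes)
        self.gene = gene            # gene name (leaves only)
        self.children = list(children)


def leaves(node):
    """Return the gene names found below a node."""
    if not node.children:
        return [node.gene]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def classify_pairs(node, result=None):
    """Label each gene pair by the event at its most recent common ancestor."""
    if result is None:
        result = {}
    if node.children:
        label = "orthologue" if node.event == "speciation" else "paralogue"
        leaf_sets = [leaves(child) for child in node.children]
        # Genes taken from two different children meet for the first time here,
        # so this node is their most recent common ancestor.
        for left, right in combinations(leaf_sets, 2):
            for a in left:
                for b in right:
                    result[frozenset((a, b))] = label
        for child in node.children:
            classify_pairs(child, result)
    return result


# Toy family (assumed data): one ancient duplication, then a human/mouse split
# in each of the two resulting copies.
tree = Node("duplication", children=[
    Node("speciation", children=[Node(gene="human_A"), Node(gene="mouse_A")]),
    Node("speciation", children=[Node(gene="human_B"), Node(gene="mouse_B")]),
])

pairs = classify_pairs(tree)
for pair, label in sorted(pairs.items(), key=lambda item: sorted(item[0])):
    print(sorted(pair), label)
```

In this toy family, `human_A` and `mouse_A` are separated by a speciation node and come out as orthologues, while any A/B pair traces back to the duplication at the root and comes out as paralogues, including pairs from different species.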

Similar resources

The Evf-2 noncoding RNA is transcribed from the Dlx-5/6 ultraconserved region and functions as a Dlx-2 transcriptional coactivator.

The identification of ultraconserved noncoding sequences in vertebrates has been associated with developmental regulators and DNA-binding proteins. One of the first of these was identified in the intergenic region between the Dlx-5 and Dlx-6 genes, members of the Dlx/dll homeodomain-containing protein family. In previous experiments, we showed that Sonic hedgehog treatment of forebrain neural e...

Intergenic Polycomb target sites are dynamically marked by non-coding transcription during lineage commitment

Non-coding (nc) RNAs are involved both in recruitment of vertebrate Polycomb (PcG) proteins to chromatin, and in activation of PcG target genes. Here we investigate dynamic changes in the relationship between ncRNA transcription and recruitment of PcG proteins to chromatin during differentiation. Profiling of purified cell populations from different stages of a defined murine in vitro neural di...

Systematic identification and characterization of chicken (Gallus gallus) ncRNAs

Recent studies have demonstrated that non-coding RNAs (ncRNAs) play important roles during development and evolution. Chicken, the first genome-sequenced non-mammalian amniote, possesses unique features for developmental and evolutionary studies. However, apart from microRNAs, information on chicken ncRNAs has mainly been obtained from computational predictions without experimental validation. ...

Phylogenetic analysis of the Wnt gene family and discovery of an arthropod wnt-10 orthologue.

Wnt genes encode a conserved family of secreted signaling proteins that play many roles in arthropod and vertebrate development. We have investigated both the phylogenetic history and molecular evolution of this gene family. We have identified a novel Wnt gene in a diversity of arthropods that is likely an orthologue of the vertebrate Wnt-10 group. Wnt-10 is one of only two cases in which or...

O-44: Characterisation of Monotreme Caseins Reveals Lineage Specific Expansion of an Ancestral Casein Locus in Mammals

Background: One important reproductive characteristic of Mammals is the production of milk to nurse the neonate. In order to better understand the evolution of milk we have investigated gene expression in milk cells from monotremes which are the most ancient representative of the mammalian lineage. Materials and Methods: Using a milk cell cDNA sequencing approach we characterise milk protein se...

Journal:
  • Database: the journal of biological databases and curation

Volume 2016

Publication date: 2016